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Joinpoint regression is used to determine the number of segments 
needed to adequately explain the relationship between two variables. 
This methodology can be widely applied to real problems, but we fo- 
cus on epidemiological data, the main goal being to uncover changes 
in the mortality time trend of a specific disease under study. Tradi- 
tionally, Joinpoint regression problems have paid little or no attention 
to the quantification of uncertainty in the estimation of the number of 
change-points. In this context, we found a satisfactory way to handle 
the problem in the Bayesian methodology. Nevertheless, this novel 
approach involves significant difficulties (both theoretical and practi- 
cal) since it implicitly entails a model selection (or testing) problem. 
In this study we face these challenges through (i) a novel reparam- 
eterization of the model, (ii) a conscientious definition of the prior 
distributions used and (iii) an encompassing approach which allows 
the use of MCMC simulation-based techniques to derive the results. 
The resulting methodology is flexible enough to make it possible to 
consider mortality counts (for epidemiological applications) as Pois- 
son variables. The methodology is applied to the study of annual 
breast cancer mortality during the period 1980-2007 in Castellon, 
a province in Spain. 
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1. Introduction. Joinpoint regression is a statistical modeling technique 
that explains the relationship between two variables by means of a segmented 
linear regression constrained to be continuous everywhere, in particular, 
in those places where the slope of the regression function changes. This 
technique is widely applied to the modeling of time trends in mortality or 
incidence series in epidemiological studies. In these applications the number 
(if any) and the location of the changes in trends (known as change-points or 
joinpoints) is usually unknown, the main goal being to assess their existence 
and determine their location. Joinpoint regression is applied in a wide variety 
of contexts. Nevertheless, for clarity of presentation of the main ideas in this 
paper, the above-mentioned epidemiological setting is assumed throughout 
the paper. 

The underlying problem in this context is a model selection (or testing) 
problem, with uncertainty about which is the model (number of change- 
points) that (most likely) produced the data observed. In this paper we 
approach the problem from a Bayesian perspective which is fully detailed 
below. Nevertheless, the predominant techniques in the context of Join- 
point regression are frequentist. Within them, the main goal is to find "the 
smallest number of joinpoints such that, if one more joinpoint is added, the 
improvement is not statistically significant" [Statistical Research and Appli- 
cations Branch, National Cancer Institute (2009)]. This goal is usually met 
by means of nonparametric permutation tests [Kim et al. (2000)], the final 
result of that analysis being the determination of the model that meets the 
former condition and the estimation of its parameters and their variability. 
The National Cancer Institute (NCI) of the United States has developed 
a tool to carry out these kinds of analyses [Statistical Research and Appli- 
cations Branch, National Cancer Institute (2009)]. This software has become 
a standard tool in epidemiological literature; see, for example, Cayuela et al. 
(2004), Bosetti et al. (2005), Stracci et al. (2007), Karim-Kos et al. (2008), 
Bosetti et al. (2008), Qiu et al. (2009). 

The proposed permutation test approach has, in our opinion, two main 
limitations: 

(i) The underlying model selection criterion selects the simplest model 
such that if a new joinpoint is added, it does not yield a statistically sig- 
nificant improvement. Therefore, the model selected is not the most likely 
one, but it is chosen according to both its capacity to describe the time 
trend within the data and the informativeness of data to highlight tempo- 
ral changes (in the case of mortality or disease incidence time series, this 
last feature will depend heavily on the average number of annual observed 
cases in the data set). Hence, in a situation when the data is not very infor- 
mative, the criteria is clearly biased (by definition) toward more simplistic 
models. This conservative behavior of the permutation procedure could seem 
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reasonable to some authors [Kim, Yu and Feuer (2009)], but we find this sys- 
tematic tendency unsatisfactory. Instead, in this situation we would expect 
to be "informed" that a number of alternative models are equally plausible. 

(ii) It is hard to quantify to what extent one selected model is more likely 
than others. In practice, a single model is chosen, with a fixed number of 
joinpoints, regardless of whether the choice is much more likely or not than 
the other alternatives. As a consequence, the main goal of the inference 
in these kinds of models (how many joinpoints can adequately explain the 
time trend that we are observing?) lacks an estimate of its variability in 
contrast to the remaining parameters of the selected model. This is because 
the asymptotics of the number of joinpoints is an involved issue. Relevant 
advances in this regard include the work of Yao (1988), Liu, Wu and Zidek 
(1997), Kim, Yu and Feuer (2009), focusing mainly on conditions where 
estimators of this parameter are consistent (converge to the parameter as 
the sample size increases). 

There is some previous work devoted to the application of Bayesian ideas 
to Joinpoint problems. Carlin, Gelfand and Smith (1992) is one of the pi- 
oneering contributions in this area, proposing the application of MCMC 
methods to fit these kinds of models. Moreno, Torres and Casella (2005) de- 
rived the intrinsic priors for a possibly heteroscedastic normal model under 
the assumption of a fixed change-point. In a similar problem but under ho- 
moscedasticity, more recently, Bayarri and Garcfa-Donato (2007) explicitly 
derived the Zellner-Siow priors [Zellner and Slow (1980); Zellner (1984)]. 
Bearing in mind its epidemiological application, Tiwari et al. (2005) de- 
scribe the calculus of Bayes factors for model selection in Gaussian joinpoint 
models. A common limitation of the application of all these studies to epi- 
demiological time series modeling is the assumption of normal errors. This 
restriction is relaxed in the work of Ghosh, Basu and Tiwari (2009), who 
propose semiparametric regression models by means of Dirichlet processes. 
Nevertheless, the original data in mortality studies are usually the annual 
observed death counts, for which in this paper we assume a Poisson regres- 
sion model to take into account the discrete nature of these data. As far as 
we know, the only Bayesian model selection approach that considers Poisson 
counts in the context of Joinpoint regression is Ghosh et al. (2009). These 
authors also acknowledge the advantage of the Poisson assumption especially 
when the observed counts are smaller, in which case the Gaussian modeling 
of incidence or mortality rates is clearly not convenient. That is probably the 
work most closely related to ours. Nevertheless, our approach differs from 
it in at least two main ways. First, Ghosh et al. (2009) model the hazard 
rate in a context of relative survival, while we propose a model for the, more 
usual in this context, incidence or death rates (see Section 3.1). Second, 
the model selection performed in Ghosh et al. (2009) is based on popular 
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model selection criteria like CPO and DIG, but not on posterior probabil- 
ities as we do. The advantages of posterior probabilities over other criteria 
are a straightforward interpretation and the richness of the results produced 
(e.g., to produce predictions). These are put in practice in Section 4. 

The Bayesian approach is straightforward at least conceptually. Bayes fac- 
tors allow the selection of models from among several alternatives, strictly 
according to their posterior probabilities. Furthermore, through Bayes fac- 
tors it is possible not only to select one of the models entertained, according 
to the evidence (posterior probability) provided by the data, but also to 
quantify the difference between the one selected and the remaining compet- 
ing models. In a broad sense, Bayes factors make it possible to evaluate the 
uncertainty involved with the selection made. Furthermore, their use is the 
basis for what has been called Model Averaging [Clyde (1999)], under which 
it is possible to average the fit of all the models weighted by their posterior 
probabilities. Therefore, uncertainty in the selection of the "correct" model 
is propagated to the inferential exercise. 

The goal of this study is to propose a Joinpoint regression modeling 
that evaluates and incorporates the uncertainty in both model selection and 
model parameters into the analysis. We found the Bayesian approach, for 
the aforementioned reasons, a very appealing way of doing so. Of course, 
the Bayesian approach to model selection problems is not free of difficulties, 
as nicely explained in Berger and Pericchi (2001). These can basically be 
summarized in two main problems: a strong infiuence of the prior on the 
results and a very challenging numerical problem to compute these results. 
Much of the material presented in this paper focuses on how we manage to 
overcome these problems. 

The rest of the paper is organized as follows: Section 2 introduces a repa- 
rameterization of the usual Joinpoint regression model which will be really 
convenient as a first step to assign prior distributions. In Section 3, starting 
from the reparameterization proposed in Section 2, we introduce a Join- 
point modeling proposal with an unknown number of change-points. That 
proposal will be carried out as a variable selection process [George and Mc- 
Culloch (1993); Dellaportas, Forster and Ntzoufras (2000, 2002)], and prior 
distributions will be discussed in detail in order to get reasonable results 
from the previous model selection problem. In Section 4 our new model will 
be applied to the study of breast cancer mortality in the Spanish province of 
Castellon to illustrate the possibilities of the Bayesian approach for exploit- 
ing the results from the inference. Finally, in Section 5 we will summarize the 
main advances of our model and some future lines of work will be outlined. 

2. A convenient parameterization of the joinpoints. Suppose that we 
want to describe the behavior of the variable {Yi;i = 1, . . . , n} as a function of 
the explanatory covariate {ti,i = 1, . . . ,n}. We find it convenient to assume 
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that ti represents time, although it could represent any other magnitude. It 
is usual to model the presence of J change-points (locations in which the 
functional describing the relationship between Y and t changes) through the 
expected value of the dependent variable as 

J 

(1) 9{E{Yi\ti)) = a + fio-ti + Y,Pj-{ii- ^3)^ 

i=i 

for certain a linking function. In this equation, Tj represents the location 
of the jth change-point and {6)'^ is if > 0, and otherwise. We label the 
model in (1) as Mj, to make it explicit that it contains exactly J change- 
points. Similarly, we call the model with no change-points Mq. 

In what follows, we assume that the maximum number of change-points 
is J*, a number which is fixed (more on this aspect later). Then the problem 
that we face is to find the posterior probability of each of the models in 
{Mq, Ml, . . . , Mj* } and in the case that a single model needs to be selected, 
the one with the highest posterior probability is preferred. 

In the Bayesian framework, the distinction between common and new 
parameters for the assignment of prior distributions [first used by Jeffreys 
(1961)] is crucial. The terminology is very intuitive: common parameters 
appear in all the competing models while the new parameters are model- 
specific. In the problem above, a and /3o are common and the remaining /3's 
are new parameters. 

As we fully describe in the following section, the scheme that we adopt for 
the assignment of the priors proposes the same marginal (noninformative) 
prior for the common parameters. Clearly, using the same prior (either sub- 
jectively elicited or not) for common parameters does not make sense unless 
the meaning of these parameters does not change throughout the different 
models entertained. Unfortunately, common parameters do not (in general) 
represent the same magnitude, this being a major difficulty in Bayesian 
model selection [see Berger and Pericchi (2001)]. Although it is hard to es- 
tablish precise conditions under which common parameters have the same 
meaning (a notion which can be viewed as subjective), it is clear that, as it 
stands, this is not the case for the problem presented above. For instance, 
in Mq, a and /3o represent the parameters explaining the global trend of 
the series. Nonetheless, in Mi (a model with one joinpoint) a and /3o are 
the intercept and the slope respectively of a line adjusted up to the change- 
point (ti), /3o + /3i being the slope from the joinpoint on. We propose an 
alternative parameterization, for which we argue that the hypothesis of com- 
mon parameters with a same meaning across models holds reasonably. More 
concisely, let 

J 

(2) E{Y, \ti) = a + /3o • (ti - t) + ^ PjBr^ (U), 
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where Brj{t), which we cah the break-point or the break-point centered at tj, 
is defined as the following piecewise linear function: 



restricted to a number of conditions which are fully specified below. What 
we finally obtain is, conditioned on the joinpoints, a known function of the 
original regressors at times {ti,t2, ■ ■ ■ ,tn} and describing a peak at the mo- 
ment Tj . Alternatively, the set of break-points can be interpreted as a base 
of continuous piecewise linear functions generating the joinpoints needed to 
describe the time trend in the data. The linear component in (2) has been in- 
troduced as (ti — t) (instead of ti) to avoid the dependency between a and /3o. 
The conditions imposed on Brj (t) are as follows: 

• lim^_^_^- (t) = lim^^^+ Brj (t) , that is, Brj{t) is a continuous function. 

Hence, it is guaranteed that the regression function (2) is continuous all 
around, independently of the number of break-points. 

• Sr=i (ti) = 0, that is, the sum of the elements of the break-point evalu- 
ated in all the points observed must be zero. This way the addition of any 
break-point in the model would not alter the mean value of the regression 
function and it would not change the meaning and the estimation of pa- 
rameter a across models. In other words, this condition can be understood 
as if the new break-points are imposed to be geometrically orthogonal to 
the intercept term. 

• Y17=i iU) •ti = 0, that is, the slope of the break-points along the whole 
period of study is zero. This way the addition of any break-points in the 
model would neither alter the slope of the regression function nor would it 
change the meaning and the estimate of parameters /3q across models. In 
other words, this condition can be understood as if the new break-points 
are imposed to be geometrically orthogonal to the slope term. 

• Brj (Tj) = 1, so that the parameter /3j has the role of measuring the magni- 
tude of the break-point in the location where the change in the tendency 
takes place. Hence, f3j has to be interpreted as the value of the devia- 
tion produced at tj as a consequence of including this break-point in the 
model. Without a restriction of this kind the value of (3j would not be 
identifiable. 

Subject to these conditions and given tj , the function Br^ (t) is unambigu- 
ously determined (details are provided as supplementary material [Martinez- 
Beneito, Garcia-Donato and Salmeron (2011)]). 

The main motivation for introducing the above basis of functions is to im- 
prove robustness over prior specification by reparameterization as is further 
explained below. In this sense, our approach is different from the other basis 




yt<Tj, 
yt>Tj, 
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Fig. 1. Left: Shape of different break-pomts as a function of their change-point locations. 
Right: Trends with (solid line), 1 (dashed line) and 2 (dotted line) change-points. All 
these functions have the same mean value and slope for the whole period. 

of functions used [like MARS in Friedman (1991) and Nott, Kuk and Due 
(2005)] where the aim was to approximate surfaces which could potentially 
be highly nonlinear on their multiple arguments. With this parameteriza- 
tion, it is now reasonable to assume that the common parameters have the 
same meaning: in all models, a and /3o are parameters of a line representing 
the global trend. The remaining parameters are used to modify this com- 
mon line to incorporate changes in the trend without changing the original 
meaning of a and /3o in Afo- 

As an illustration, a graphical representation of several break-points, cor- 
responding to different locations of the change-points (for 20 observed val- 
ues), is shown in Figure 1 (left). As can clearly be seen, all break-points 
are equal to 1 at their corresponding change-points, are zero on average 
and their slope is also null. Following this same context. Figure 1 (right) 
shows how the inclusion of break-points modifies a straight line common 
to all models (a + /3o ■ t). In this figure we have plotted the line y{t) = 
10 + 0.1 • t, jointly with the functions y{t) = 10 + 0.1 • t + 0.5 • B^it) and 
y{t) = 10 0.1 • t 0.5 • B^{t) - 0.3 • Bu{t). The effect of including break- 
points over a common regression function is the torsion of this function, 
instead of a divergence from a certain place as was done in the parameter- 
ization in (1). In other words, the effect of the introduced break-points is 
a global modification of the regression curve, as opposed to the previous pa- 
rameterization which produced a local modification, changing the meaning 
of the parameters a and /3o depending on whether the break-points are (or 
are not) included in the model and how many of them are included. 

Additionally, a number of nice side effects are derived from this parame- 
terization: 
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• With the definition of the break-points, the columns of the design matrix 
corresponding to the common parameters are orthogonal to the columns 
of the new parameters. Hence, we can reasonably expect the Fisher in- 
formation to be approximately block diagonal (of course, it depends on 
the particular distribution assumed for Y , and will be true for the normal 
case). In this scenario, it is known that the Bayes factors (and consequently 
any other result derived from them, like the posterior probabilities) are 
quite robust to the prior distribution used for common parameters [see 
Jeffreys (1961); Kass and Vaidyanathan (1992)]. This has been used as 
a justification for the strategy of using noninformative priors, possibly 
improper, for common parameters [see Liang et al. (2008); Bayarri and 
Garcia-Donato (2008)]. We use a similar approach, as we explain in the 
next section. 

• With this reparameterization, the common parameters a and /3o conserve 
their meaning regardless of the model (i.e., regardless of the number of 
joinpoints). This allows us to make inferences about them in a model- 
averaged way, that is, taking into account the uncertainty regarding which 
is the true model. We put this into practice in Section 4 where we draw 
inferences on these parameters. 

3. A Joinpoint regression model with an unknown number of change- 
points. 

3.1. An encompassing model. To address the question of how many join- 
points are needed, or, equivalently, to select from among {Mq, Mi, . . . , Mj* }, 
we introduce an encompassing model in which all these models are nested. 
We also assume the scenario of modeling epidemiological series of mortality 
or incidence counts of a certain disease, since this is the most extended ap- 
plication of Joinpoint models. Nevertheless, the main ideas that we present 
hold for other types of data, not necessarily counting data. 

Let Yi be the number of cases of a specific disease observed during a period 
of time, represented by ti. To account for the discrete nature of these values, 
we suppose that 

~ Poisson(;Uj), i = l,...,n, 
whose mean is defined as 

J* 

(3) log(^i) = log(P,) + a + /?o • (i. - t) + 5^ 5j ■ {pj ■ Br^ (i,)) , 

i=i 

where Pi is the population under study during year i, {(5^; j = 1, . . . , J*} 
are binary variables which include (for 5j = 1) or exclude (for 6j = 0) each 
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change-point in the model and tj is the unknown location of the jth break- 
point. This model can be interpreted as an encompassing model, which con- 
tains all possible joinpoint models unambiguously identified through S = 
{6i, . . . ,6j*), to explain the data. In particular, Mq = {5 = 0}, Mi = {S : 
'^j=i 6j = 1}, . . . and Mj* = {S = 1}. Hence, the posterior probability over 
the model space {Mq,Mi, . . . ,Mj*} is completely specified through the pos- 
terior distribution of S. A detailed description of the computation of 7r(d|y) is 
provided as supplementary material [Martinez-Beneito, Garcia-Donato and 
Salmeron (2011)]. The proposal in (3) can in a way be seen as an order-2 
regression spline [Hastie, Tibshirani and Friedman (2009)], as this is a piece- 
wise linear continuous regression function. 

To avoid identifiability problems, we impose a number of restrictions on 
the locations of the change-points on which there is broad consensus in the 
related literature. These are imposed so as to ensure a minimum distance 
between change-points and a restriction of order. In particular, we assume 
that the parametric space for r = {ti,T2, ■ ■ ■ ,tj*), which we call Q, is 

Q = {(ri, . . .,rj.) :ti + d < ti,ti + d < T2,T2 + d < T3, . . . ,tj* + d < 

With this assumption, there is a minimum number of periods d between 
any two change-points. By default, we use d = 2 in our applications. Notice 
that this restriction avoids the existence of two or more change-points be- 
tween consecutive observations. Other ways of implementing this restriction 
have been proposed, as, for example, in Ghosh, Basu and Tiwari (2009). 
Notice that is a bounded set in IZ'^* . 

3.2. The prior distribution. Let 7r(a, Po,S, r, f3) be the prior distribution 
of the parameters in the model (3). We express this prior as 

7r(a, /3o, /3, T, 5) = 7r(/3|a, /?o, 5, r)7r(a, /3o)vr(T)7r((5) , 

where /3 = (/3i, . . . , /3j.). 

The prior 7r(Q;,/3o) corresponds to common (with the same meaning, as 
argued in the previous section) parameters in all models. Under this con- 
dition, it is common to use a noninformative prior [see, e.g., Berger and 
Pericchi (2001); Bayarri and Garcfa-Donato (2007, 2008)], in this case the 
constant prior: 7r(a,/3o) oc 1. 

All the other parameters are not common and it is well known [Berger 
and Pericchi (2001)] that the posterior distribution is very sensitive to their 
prior distribution. In particular, noninformative (improper) or vague priors 
would produce arbitrary Bayes factors. 

Our approach to assigning 7r(/3|a, /3o) ^7 ''') is based on an approximation of 
the divergence based (DB) prior, introduced by Bayarri and Garci'a-Donato 
(2008) as a broad generalization of the pioneering ideas of Jeffreys (1961), 
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Zellner and Siow (1980) and Zellner (1984). When comparing two nested 
models, an approximation of the DB prior for the new parameters in the 
complex model (possibly reparameterized to range within the real line) is 
a heavy tailed density, centered at zero and scaled by the inverse of the 
corresponding block of the unitary Fisher information matrix evaluated in 
the simpler model. This would lead us to the proposal 

(4) 7r(^|a,/3o,<5,r) = Cauchyj.(/3|0,I]), 

the matrix S to be specified. Using a hierarchical scheme, this is equivalent 
to the proposal 

7r(/3|a, Po,S, 'T, 7) = Normalj. (/3|0, 7XI), 7 ~ Gamma" ""^(0.5, 0.5), 

where 7 acts as a "mixing" parameter. 

In our problem, it is easy to see that the block (corresponding to /3) of 
the Fisher information matrix of the encompassing model (3) evaluated at 
/9 = is X = AB*WBA, where B = {Brj{ti)} (the matrix of covariates), 
W = diagjwj} with Wi = Pi exp(a + /3i(ti — t)) and A = diag(6). Clearly, I is 
not (for every 6) a positive-definite matrix, so it cannot be used directly to 
define S above. Instead, we propose 

(5) E = n(AB*WBA + diag(B*WB - AB*WBA))~\ 

which is (for every d) a positive-definite matrix. 

As we argue below, the resulting proposed prior (4) with a scale matrix 
as in (5) has a very interesting interpretation. Given a particular S*, with, 
say, = Ist f3^ be the J-dimensional subvector of f3 which corresponds 
to the nonnull Jj's. Similarly, denote P^c as the remaining parameters in /3. 
Finally, B5 denotes the matrix with the columns in B which corresponds to 
the nonnull (5j's. Hence, it can easily be seen that and Pgc are independent, 
conditional on the mixing parameter 7. In fact, 

n{(3\a, /3o, 5 = 5*, r , 7) = Normalj (/35|0, -fniBlWBs)-^)f{Ps^ \a, /3o, r, 7), 

where 

f{j3sc\a,Po,r,j)= J] Normal(ft |0,7n[B*WB],::i). 

{i:Si=0} 

For every S, the joint prior 7r(/3|a, /3o, r) can be seen as the product of 
the (approximated) DB prior for the active parameters in the model, times 
a proper density for those inactive parameters. This proper density has no ef- 
fect on the corresponding Bayes factors, thereby acting as a pseudoprior [see, 
e.g., Carlin and Chib (1995); Han and Carlin (2001); Dellaportas, Forster 
and Ntzoufras (2002)]. Nevertheless, as noticed previously in the literature, 
these may have a great impact on the numerical results. In our experi- 
ence, partially described in the supplementary material [Martinez-Beneito, 
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Garcia-Donato and Salmeron (2011)], this particular form of the pseudoprior 
leads to quite satisfactory results. 

With this approach, the priors are in some sense defined encompassingly. 
That is, instead of using one (DB) prior for each possible submodel nested 
in (3), implicitly defined by each S, we use a single prior which contains 
all these priors. The main advantage of the resulting procedure is that it 
makes it possible (as we describe in the next section) to use standard esti- 
mation procedures (like MCMC) to easily solve the model selection problem, 
whose direct resolution usually requires the help of sophisticated numerical 
techniques. 

The proposal for ir{T) is not as delicate an issue as the prior for f3. This 
is because the corresponding parametric space is a bounded set which en- 
sures that 71(6) is proper under very mild conditions. Hence, the normalizing 
constant is unambiguously defined. In this situation, the constant prior is 
clearly a reasonable default choice: '7r(r) cx 1 for r € 0. 

Finally, we introduce our proposal for tt{S), which is deeply related to our 
prior beliefs over the model space {Mq, Mi, . . . , Mj*}. One possibility would 
be to use independent Bernoulli distributions for 5i with a probability of suc- 
cess of 0.5. Nevertheless, this would lead us to the binomial(J*, 0.5) distribu- 
tion over the model space, clearly favoring those models with around J*/2 
joinpoints and also giving an undesirable important role to the fixed value J* 
(which is usually posed as arbitrarily large). 

Instead, we experimented with two different proposals. In the first one 
(which we refer to as Bayesl), all models Mj have the same probability [i.e., 
1/(J* + 1)], this probability being equally distributed over the same number 
of joinpoints. This prior distribution can be formulated in a hierarchical way 
as 



The term in the denominator of the prior for S yields the same prior 
probability for any number of change-points in the model if = • • • = pj* = 
0.5 (the prior expected values of these terms). This hierarchical formulation, 
once the parameters p are integrated out, can also be expressed as 



The main drawback of this proposal is that, a priori, the mean number 
of joinpoints included in the model (J*/2) depends on the arbitrary quan- 
tity J*. While this is not a problem when J* is carefully assigned, it could be 



7r{6\p) oc 




7r(pi) = Beta(l/2,l/2),i = 1,...,J*. 
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an issue when this parameter is arbitrarily assigned (as is the case in many 
studies). To avoid this dependency on J* as much as possible, while keeping 
the essence of the first proposal, we alternatively explore a slight modifica- 
tion on T^i{5). The prior (to be called Bayes2) can be defined hierarchically 
as 

7r((5i|pi) =Bernouni(pi), i = l,..., J*, 

7r(pi) = Beta(l/2, (J* - l)/2), i = 1, . . . , J*. 

With this second proposal, the prior expected number of break-points 
(1 break-point) does not depend on the value of J*. 

In the same way as for Bayesl, this prior distribution can also be expressed 
in a nonhierchical way, once the pi parameters are marginalized, as 

Interestingly, as pointed out by an Associate Editor, for large J* , the prior 
probability of zero and one joinpoint tends to ~ 0.37, emphasizing the 
robustness of 7r2 to the choice of J*. 

4. Breast cancer mortality trend in Castellon province. We analyze the 
breast cancer mortality time trend in Castellon province, for the period 
1980-2007. Castellon is one of the 50 provinces that make up Spain, where 
around 285,000 women resided in 2007 and with 62.4 women dying annually 
of breast cancer, on average, during that period. Two facts have occurred in 
those years that have presumably changed breast cancer mortality trends, 
which are the progressive introduction of the Breast Cancer Screening Pro- 
gram in that province in 1992 [Vizcaino et al. (1998)], and the introduction 
of new therapies for the treatment of this disease around the world, at ap- 
proximately the same time. In fact, there is some controversy over which of 
these two factors could have a higher impact on mortality variation [Peto 
(1996); Blanks et al. (2000); Berry et al. (2005)]. In any case, we would 
expect to find at least one joinpoint on the breast cancer mortality trend 
studied. Moreover, changes in breast cancer mortality trends have already 
been described for the first half of the 1990s for other Spanish regions [As- 
cunce et al. (2007); Salmeron et al. (2009); Perez Lacasta et al. (2010)]. 

Regarding Bayesian criteria, if we focus on the mode of the posterior dis- 
tribution of the number of joinpoints, in Figure 2 it can be seen that both 
methods point to the existence of exactly one joinpoint, indeed, Bayesl 
yields a 51.0% probability of one joinpoint, while Bayes2 estimates that 
probability at 73.6%. Moreover, the probability of no joinpoint in the time 
trend for these two models is 1.9% and 3.0%, respectively. Hence, the prob- 
ability of a "simply linear" trend for breast cancer mortality is low. 
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Fig. 2. Posterior distribution of the number of joinpomts in breast cancer mortality trend 
in Castellon Province for Bayesi and Bayes2 criteria. 



On the other hand, we have used the Permutation and BIG criteria, by 
means of the NCI tool. The Permutation criterion (the one suggested by the 
NCI tool's authors) does not find any joinpoints in the period under study, 
possibly due to the limited information on the trend provided by the low 
number of deaths by year. The BIC criterion, however, finds one joinpoint 
around 1995, with a 95% confidence interval: [1990, 2005]. As a consequence, 
the conservative behavior observed of the Permutation criterion makes it 
yield results which neither agree with the remaining criteria nor with the 
results that we expected, based on further knowledge of this cause of mor- 
tality. It has to be acknowledged that the Permutation test yields a p-value 
of 0.0136 when testing 1 versus joinpoints, whereas the significance level 
of the test is 0.0125 (the result of the Bonferroni correction of the original 
level 0.05), that is, the model without any joinpoint is really close to being 
rejected. Therefore, the arguments for selecting between the models with ei- 
ther none or just one joinpoint are not conclusive at all for this example, but, 
on the contrary, the consequences derived from that selection are dramatic. 
This result warns against the danger of those criteria choosing a particular 
number of joinpoints and ignoring the uncertainty in that choice. In our ex- 
ample the permutation test would presumably have led to a wrong answer 
and from that moment on all the results from the analysis would have been 
completely missleading, as the premises of the chosen model are accepted as 
true and those from alternative models are completely ignored. 

The left-hand side of Figure 3 shows the least squares estimated trend for 
a different number of joinpoints (recall that the Permutation test chooses 
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Fig. 3. Left: Least squares estimated trends for a different number of joinpoints for the 
annual observed death rate (per 100,000 women). Right: Estimated time trend for the 
annual observed death rate (per 100,000 women) for both Bayesl and Bayes2. 



the curve with joinpoints from among all these). As can be noticed, fit- 
ted trends are quite different and the repercussion of the choice of one or 
another curve, and as a consequence of ignoring alternative models are dra- 
matic. Those consequences could be even worse in the case of temporal 
forecasting. The former figure also shows the forecasted trend for every one 
of these curves for the following 5 years (2008-2012) and, as can be appreci- 
ated, predictions for the model without joinpoints diverge completely from 
those of the models with joinpoints. As the Permutation based criteria lacks 
an estimate of the probability of these scenarios, they cannot be averaged 
and one of them has to be chosen with the previously outlined risks. This 
is not the case of our proposed methods. The right-hand side of Figure 3 
shows the estimated trend by both Bayesl and Bayes2 where predictions 
based on different numbers of joinpoints are averaged to provide a single 
answer weighting all the scenarios considered. Moreover, as pointed out in 
George (1999), the predictions derived with this Model Averaging proce- 
dure are optimal in several senses, the square error loss being one of them. 
Predicted trends for both Bayesian criteria are really close, although the 
Bayesl prediction depicts more detail than the Bayes2 as a consequence 
of the higher number of joinpoints that this model occasionally considers. 
From now on we will focus on the results of Bayesl to improve the clarity 
of the presentation. 

The estimation of (3q from Bayesl yields a posterior mean of —0.0017 
and a 95% probability interval [—0.0077,0.0042]. Consequently, the global 
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Fig. 4. Left: Joint posterior distribution of the location of joinpomts conditioned to the 
model with just 2 joinpoints. Histograms correspond to the marginal posterior distribution 
of the location of the first and second joinpoint. Right: Probability of the trend having 
described a change-point before any specific moment. 



trend during the whole period studied has a slope which is not very different 
from 0. The interest of this result is that it is valid regardless of the true 
number of joinpoints underlying the true trend, as we are not conditioning 
our results on a specific number of change-points. Conversely, the frequentist 
results condition all their conclusions on the selected model, and, therefore, 
they are valid if and only if that model agrees with the reality. 

Figure 4 (left) shows the joint posterior distribution of the location of 
these two joinpoints. As can be noticed, this figure points out the existence of 
one joinpoint from about 1993 to 1997 and another one less precisely placed 
in the range from 1983 to 2005. For this second joinpoint there are several 
places that seem to be likely to host the second joinpoint. This information 
is much richer than that provided by the least squares estimate in Figure 3 
that just points to 1987 as the most likely year for the second joinpoint. 
Focusing in the results in Figure 4 (left), our impression is that reducing 
the results of the 2-joinpoints-fit to a single curve (although it corresponds 
to the least squares fit) could be a really poor summary of all these results. 
Moreover, the marginal distribution of the locations of both joinpoints are 
multimodal and clearly asymmetric. Therefore, confidence intervals obtained 
for these quantities under asymptotic approximations will have to be treated 
with caution. 

Finally, Figure 4 (right) shows, at every moment of the period under 
study, the probability of the trend having described a change-point before 
that precise moment. As it is evident that the curve converges to 0.981, the 
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complement of the probability of not describing a joinpoint along the whole 
period of study. Moreover, from the start of 1994 the probability of already 
having described a joinpoint is higher than 50%, at the start of 1998 that 
probability is higher than 90% and we have to wait until 2002 to be sure, 
with a 95% probability, of having described at least one joinpoint. This way 
we can also measure (without any asymptotic assumptions) the degree of 
certainty that we have about the trend having described a change at any 
moment. It is possible to derive many posterior summaries of this kind with 
this approach. These will be able to answer most of the epidemiological 
questions that could be of particular interest in real studies. 

5. Conclusions. This study has introduced a novel approach to Joinpoint 
analysis, taking advantage of considering the model selection problem as an 
inference problem on an encompassing model which contains all the candi- 
date alternatives. Moreover, the possibility of carrying out its inference in 
WinBUGS is an important added value, as it makes it available to a wide 
community of users for further applications. We also think that the repa- 
rameterization made of the original problem has also been important, as it 
has made it possible to incorporate many previous model selection theory 
results. These results have given us a really valuable insight into the prior 
distributions of noncommon parameters, which is the main challenge from 
the Bayesian point of view, in order to give a reasonable answer to model 
selection problems. 

Our proposal models the observed mortality as Poisson counts. This dis- 
tributional assumption has several advantages: (i) it can cope with zero 
counts without any problem, (ii) no additional assumption has to be made 
about the variability of the observations, since it is implicitly established 
and (iii) avoids any Gaussian assumption that may not be appropriate at 
all, especially when the observed counts are lower. On the other hand, the 
location of change-points is now considered continuous in time, as in Yu 
et al. (2007), Ghosh, Basu and Tiwari (2009). In our opinion this is a more 
realistic assumption, as if a rupture point really has existed, it could have 
occurred at any moment in the period under study, regardless of whether 
we have observed counts aggregated for subintervals of that same period. 

A secondary question is to know how the Bayesian approach compares 
with the existing methods in the frequentist arena. We do so through an in- 
tensive simulation study (details provided as supplementary material [Marti- 
nez-Beneito, Garcia-Donato and Salmeron (2011)]). They are useful to know 
under what circumstances which method is expected to choose, on average 
and over replicated data sets, the "correct" model. What we found is that 
the Bayesian proposals are more sensitive (compared with the Permutation 
approach), although less specific in the detection of joinpoints. This seems 
to be an expected consequence of the known conservative behavior of the 



JOINPOINT REGRESSION WITH UNKNOWN BREAK-POINTS 



17 



Permutation method and the probabihstic essence of Bayesian approaches. 
Therefore, in a context where data may not be very informative, the use of 
Bayesian methodology should really be encouraged. As a consequence, those 
methods shown in this paper could be of interest for those disease registries 
of a moderate size, not as big as those usually used by the National Cancer 
Institute of the United States. 

As just outlined, statistical power is an issue of concern in Joinpoint 
studies. This concern is even greater when covariates are considered from 
the frequentist approach, as in those cases the original data set is usually 
split for independent analyses (one for every value of the covariate) and, 
as a consequence, the statistical power of every one of those subanalyses 
is decreased. The new proposals introduced in this paper are model-based 
and, hence, the new covariates could also be included in our model as main 
effects or as an interaction with the terms already considered in the model. 
Indeed, from the frequentist point of view some progress has been made 
toward this kind of modeling in the case of having at most one joinpoint; 
see, for example, Pollan et al. (2009). In that case we would not be forced 
to make independent data analyses for every value of the covariate and 
that way we would retain the statistical power of the original analysis that 
did not consider any covariate. Also, this kind of analysis would be the 
straightforward way to analyze age standardized rates from the methodology 
that we have just outlined. This possibility is very attractive and will be one 
of the main lines of development following this study. 

SUPPLEMENTARY MATERIAL 

Supplement Document (DOI: 10. 1214/11- AOAS471SUPP; .pdf). A sup- 
plemental document for this paper has been written containing further de- 
tails about: Performance of the proposed methods on simulated data sets. 
Calculus of the basis functions allowing the fitted trends to describe join- 
points and some remarks about Bayes factors and their computation in our 
specific setting. This document can be found at Martinez-Beneito, Garcia- 
Donato and Salmeron (2011). 
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